Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Algorithms: Extended Abstract
Abstract
The celebrated multi-armed bandit (MAB) problem, originating from the work of Gittins et al. [GGW89], presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also generalizes to the case of MAB superprocesses with (stochastic) multi-period actions. This generalization captures the explore-exploit budgeted learning framework introduced by Guha and Munagala [GM07a, GM07b]. Also, we obtain a tight (1/2 − ε)-approximation for the variant where preemption (playing an arm, switching to another arm, then coming back to the first arm) is not allowed. This contains the stochastic knapsack problem of Dean, Goemans, and Vondrák [DGV08] with correlated rewards, for both the cancellation and no-cancellation cases, improving the 1/16 and 1/8 approximations of [GKMR11], respectively. Our algorithm samples probabilities from an exponential-sized dynamic programming solution, whose existence is guaranteed by an LP projection argument. We hope this technique can also be applied to other dynamic programming problems which can be projected down onto a small LP.

∗willma353@gmail.com, Operations Research Center, Massachusetts Institute of Technology. Supported in part by the NSERC PGS-D Award, NSF grant CCF-1115849, and ONR grants N00014-11-1-0053 and N00014-11-1-0056.
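To make the objects in the abstract concrete, here is a minimal sketch of the kind of exponential-sized dynamic program referred to above, for the no-cancellation stochastic knapsack with correlated size/reward pairs. The instance data (`ITEMS`, `CAPACITY`) is an invented toy example, and this is not the paper's sampling algorithm: the state is the set of items not yet inserted together with the residual capacity, and the value of the optimal adaptive policy satisfies a simple recursion (an item that overflows the knapsack terminates the process with no further reward).

```python
from functools import lru_cache

# Hypothetical toy instance: each item has a distribution over correlated
# (size, reward) pairs, realized only once the item is inserted.
ITEMS = [
    [((1, 3), 0.5), ((3, 3), 0.5)],   # item 0: size 1 or 3, each w.p. 1/2, reward 3
    [((2, 4), 1.0)],                  # item 1: deterministic size 2, reward 4
    [((1, 1), 0.5), ((2, 5), 0.5)],   # item 2: small size/reward or large size/reward
]
CAPACITY = 3

@lru_cache(maxsize=None)
def opt(remaining, cap):
    """Expected reward of the optimal adaptive policy when the items in
    `remaining` (a frozenset of indices) are still available and `cap`
    units of capacity are left."""
    best = 0.0  # the policy may always stop
    for i in remaining:
        rest = remaining - {i}
        val = 0.0
        for (size, reward), p in ITEMS[i]:
            if size <= cap:
                # item fits: collect its reward and continue adaptively
                val += p * (reward + opt(rest, cap - size))
            # else: the item overflows and the process stops (0 more reward)
        best = max(best, val)
    return best

value = opt(frozenset(range(len(ITEMS))), CAPACITY)
print(value)  # expected reward of the optimal adaptive policy
```

On this toy instance the optimal adaptive policy inserts item 2 first and earns expected reward 5.75. The number of DP states is exponential in the number of items, which mirrors the exponential-sized dynamic programming solution the abstract describes sampling from via a small LP.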
Similar resources
Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Approximation Algorithms: Full Version
The multi-armed bandit (MAB) problem features the classical tradeoff between exploration and exploitation. The input specifies several stochastic arms which evolve with each pull, and the goal is to maximize the expected reward after a fixed budget of pulls. The celebrated work of Gittins et al. [GGW89] presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al....
Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items (Extended Abstract)
Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...
Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem
This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, which was later extended by Burnetas and Katehakis, established the presence of a logarithmic bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the logarithmic bound. We also show the non existen...
Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem
This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, which was later extended by Burnetas and Katehakis, established the presence of a logarithmic bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the bound. We also study the existence of logarith...
Multi-armed Bandit Problem with Lock-up Periods
We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under restricted conditions is not obvious. We extend ...
Publication date: 2013